Optimisation of a Brownian dynamics algorithm for semidilute polymer solutions 
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Simulating the static and dynamic properties of semidilute polymer solutions with Brownian dy- 
namics (BD) requires the computation of a large system of polymer chains coupled to one another 
through excluded- volume and hydrodynamic interactions. In the presence of periodic boundary con- 
ditions, long-ranged hydrodynamic interactions are frequently summed with the Ewald summation 
technique. By performing detailed simulations that shed light on the influence of several tuning pa- 
rameters involved both in the Ewald summation method, and in the efficient treatment of Brownian 
forces, we develop a BD algorithm in which the computational cost scales as 0{N^' ), where A*' is 
the number of monomers in the simulation box. We show that Beenakker's original implementation 
of the Ewald sum, which is only valid for systems without bead overlap, can be modified so that 
^-solutions can be simulated by switching off excluded-volume interactions. A comparison of the 
predictions of the radius of gyration, the end-to-end vector, and the self-diffusion coefficient by BD, 
at a range of concentrations, with the hybrid Lattice Boltzmann/Molecular Dynamics (LB/MD) 
method shows excellent agreement between the two methods. In contrast to the situation for dilute 
solutions, the LB/MD method is shown to be significantly more computationally efficient than the 
current implementation of BD for simulating semidilute solutions. We argue however that further 
optimisations should be possible. 
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I. INTRODUCTION 

Understanding the behaviour of polymer solutions in 
the semidilute regime of concentration is important both 
from the point of view of applications, and from the 
point of view of advancing fundamental knowledge in 
polymer science. Until recently, insight into semidilute 
polymer solution behaviour was largely obtained through 
approximate analytical and scaling theories [1-8]. How- 
ever, significant progress in the development of meso- 
scopic simulation techniques [9-11], which allow the ex- 
ploitation of underlying theories without the need for ap- 
proximations, has made it possible for the first time to 
obtain detailed predictions of equilibrium and nonequi- 
librium properties that can be compared with experi- 
mental observations. The successful implementation of 
mesoscopic simulations has been made possible through 
the use of algorithms that enable an accurate depiction 
of the semidilute regime. Essentially, this requires the 
ability to describe long polymers that overlap with each 
other, while maintaining a low segment density. Further, 
the segments must be capable of interacting with each 
other through solvent-mediated hydrodynamic interac- 
tions [2, 9, 12-14]. Three different mesoscopic simula- 
tion methods, all of which use coarse-grained bead-spring 
chain models for polymer molecules, have been developed 
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recently that achieve these objectives. Two of these tech- 
niques, namely, the hybrid Lattice Boltzmann/Molecular 
Dynamics (LB/MD) method [15, 16] and the hybrid 
Multiparticle Collision Dynamics/Molecular Dynamics 
(MPCD) method [17-19] treat the solvent exphcitly. As 
a consequence, hydrodynamic interactions between poly- 
mer segments arise naturally through the exchange of 
momentum between the beads on a chain and solvent 
molecules. In the third approach [10], which is Brown- 
ian dynamics (BD) simulations [20], the solvent degrees 
of freedom are removed completely, but their effect is 
taken into account through long-range dynamic correla- 
tions in the stochastic displacements of the beads. The 
very nature of semidilute polymer solutions, particularly 
the need to use periodic boundary conditions to describe 
homogeneous polymer solutions in unbounded domains, 
necessitates the simulation of a large number of particles. 
As a result, the computational efficiency of a simulation 
technique becomes an important consideration. 

To our knowledge, there has been no systematic in- 
vestigation to compare the performance of the different 
techniques in terms of their computational efficiency in 
the semidilute regime. Recently, however, a quantita- 
tive comparison of the predictions of the explicit sol- 
vent LB/MD method with the predictions of the implicit 
solvent BD method for the dynamics of a single chain 
in a solvent, i.e. in the dilute regime, has been carried 
out with a view to compare their computational efficien- 
cies [21, 22]. It was shown by Pham et at [21] that in or- 
der to observe the system for the same time span in phys- 
ical units, significantly less CPU time is required with BD 



in comparison to LB/MD, for bead-spring chains with 
Ni, < 10®, where Ni, is the number of beads per chain. 
The situation, however, is expected to be quite differ- 
ent in the semidilute regime. For the LB/MD method, 
the CPU cost scales hnearly with the number of parti- 
cles, which implies that the CPU cost grows as L^ since 
the solvent particles (the calculation of whose dynamics 
dominates the CPU cost), are distributed on lattice grid 
points in a simulation box of size L. In order to prevent a 
chain from wrapping over itself due to spatial restriction 
and hence altering its static conformation, it is necessary 
to ensure that L > 2yJ [Rf), where (-Re) i^ the mean- 
square end-to-end distance of the chain. In the dilute 
case, this leads to the CPU time scaling as N^'^ for the 
LB/MD method, where v is the Flory exponent. Using 
a simple scaling argument based on the blob picture of 
semidilute solutions [3], Pham et al. [21] suggest that the 
CPU effort is even somewhat decreased in semidilute so- 
lutions due to the shrinkage of the chains resulting from 
the screening of excluded-volume interactions [3, 23] — 
note that this argument is based upon considering the 
smallest possible system size. 

In the case of BD, even though the number of de- 
grees of freedom is significantly reduced by eliminating 
the solvent, implementation of pairwise hydrodynamic 
interactions between segments proves to be extremely 
computationally expensive. For dilute polymer solu- 
tions, the computational cost of evaluating intramolec- 
ular hydrodynamic interactions arises from the need to 
carry out a decomposition of the diffusion tensor that 
appears in the stochastic equation of motion. A straight- 
forward Cholesky decomposition leads to an algorithm 
that scales like 0{N^). Many current implementations 
of single chain BD simulations, however, mitigate this 
large CPU cost by using Fixman's polynomial approxi- 
mation to this decomposition, which leads to 0{N^'^^) 
scaling [24-27]. In the case of semidilute polymer solu- 
tions, both intramolecular and intermolecular hydrody- 
namic interactions must be taken into account. The use 
of periodic boundary conditions to imitate bulk systems 
necessitates the evaluation of hydrodynamic interactions 
not only between one particular chosen segment and all 
other segments within the primary simulation box, but 
also with all the other segments in all the periodic images 
of the box. Because of the long-range nature of hydro- 
dynamic interactions, which decay only reciprocally with 
distance, this sum converges very slowly and only condi- 
tionally [28, 29]. Inspired by its earlier success in sum- 
ming electrostatic interaction (which are also long-ranged 
in nature) , the problem of slow convergence has been re- 
solved through the use of the Ewald summation tech- 
nique [30-34] — both in the context of BD simulations 
of colloidal suspensions where hydrodynamic interactions 
between particles with a finite radius must be taken into 
account [35-39], and in the context of BD simulations 
of semidilute polymer solutions where the polymer seg- 
ments are assumed to be point particles [10]. 

Rapid convergence is achieved in the Ewald method 



by splitting the original expression into two sums, one of 
them in real space and the other in reciprocal space, both 
of which converge exponentially. A straightforward im- 
plementation of the Ewald sum, however, is computation- 
ally demanding, scaling like 0{N^), where N = NcX N^, 
is the total number of beads in the primary simulation 
box with Nc polymer chains. Interestingly, by a suitable 
choice of a parameter a in the Ewald sum that tunes 
the relative weights of the real space and reciprocal space 
contributions (consequently splitting the load of calculat- 
ing the total sum between the real space and reciprocal 
space sums), it is possible to make the computational cost 
of calculating either the real space or the reciprocal space 
sum scale like 0{N'^), while the remaining sum scales as 
0{N). In their recent simulation of semidilute polymer 
solutions, Stoltz et al. [10] have implemented a BD algo- 
rithm that leads to the real space sum scaling like 0{N'^). 
In the case of colloidal suspensions, accelerated BD al- 
gorithms have been developed by Brady and co-workers 
with the Ewald sum scaling like 0(A''logA^) [38, 39]. 
The essential idea is to retain an 0{N) scaling for the 
real space sum, while reducing the complexity of the re- 
ciprocal part of the Ewald sum to 0(A^log7V) with the 
help of Fast Fourier Transformation. For confined sys- 
tems which are non-periodic, and in which methods based 
on Fourier transforms are not applicable, the Wisconsin 
group have recently successfully introduced a BD simu- 
lation technique they term the "general geometry Ewald- 
like method", which achieves O(A^logA^) scaling [40]. 
Analogous to the Ewald method, the technique is based 
on splitting the solution to the Stokes equation into sin- 
gular short-ranged parts and smooth long-ranged parts. 
Thus, even though a detailed quantitative comparison 
of all the currently available mesoscopic simulation tech- 
niques is yet to be carried out, they all appear to scale, in 
their most efficient versions, roughly linearly with system 
size. 

In the context of electrostatic interactions, two differ- 
ent classes of schemes have been proposed for the op- 
timisation of the Ewald sum [31, 34, 41, 42]. One of 
these classes (on which the accelerated BD schemes are 
modelled), achieves 0(A^log7V) scaling by assigning par- 
ticles to a mesh and then using Fast Fourier Transform 
techniques to evaluate the reciprocal-space part of the 
Ewald sum on this mesh [32, 34]. The other class [41, 42] 
achieves 0{N^-^) scaling by balancing the computational 
cost of evaluating the real space and reciprocal space 
sums, i.e. by an optimal choice of the aforementioned 
parameter a. To our knowledge, the latter approach has 
so far not been trialled for summing hydrodynamic in- 
teractions. 

In the context of hydrodynamic interactions, it is also 
worth noting that the BD simulation of semidilute poly- 
mer solutions carried out by Stoltz et al. [10] differs from 
BD simulations of colloidal suspensions [36, 39] in the 
procedure adopted for the calculation of far-field hydro- 
dynamic interactions, even though both are based on the 
Ewald summation technique. While the latter are based 



on Hasimoto's solution of the Stokes equation for flow 
past a periodic array of point forces [28], i.e. on the Ewald 
sum of the Oseen-Burgers tensor, the former is based on 
Bcenakker's solution [29], which is the Ewald sum of the 
Rotne-Prager-Yamakawa (RPY) tensor. The RPY ten- 
sor [20, 43, 44] is a generalisation of the Oseen-Burgers 
tensor in two aspects: Firstly, it approximately takes into 
account the finite particle radius by representing the far- 
field hydrodynamic flow field up to quadrupolar order 
of its multipole expansion [45] (Oseen-Burgers is just a 
monopole field), and, secondly, it regularises the singu- 
larity that occurs for small inter-bead distances. Such 
a regularisation is necessary for BD simulations that al- 
low for configurations with overlapping beads, since oth- 
erwise the diffusion tensor would not always be positive- 
definite, implying a violation of the second law of thermo- 
dynamics. The problem can be avoided by introducing 
sufficiently strong excluded-volume interactions, which 
suppress the occurrence of such configurations; this was 
the approach taken by Stoltz et al. [10]. By this pro- 
cedure, they also avoided the problem that Beenakker's 
formulae [29] are applicable only to the far-field branch 
of the RPY tensor and not to the regularised near-field 
branch. However, one can anticipate that bead over- 
lap will occur in simulations of semidilute 0-solutions, 
since ^-solutions are commonly simulated by switching 
off excluded-volume interactions [27, 46]. We therefore 
develop in the present paper a method that is able to 
deal with overlaps, by implementing the Ewald sum of 
both branches of the RPY tensor. It should be noted that 
the near-field RPY formula is not the only possible reg- 
ularisation that has been discussed in the literature; an 
alternative was suggested in Ref. 47. While the details 
of the regularisation are unlikely to significantly affect 
the dynamic properties of semidilute polymer solutions, 
we have focused on the RPY formula, since, according 
to our knowledge, it is the only known regularisation 
that provenly provides positive-definiteness for all chain 
lengths and configurations [20] . For a physical motivation 
of the RPY regularisation, see Ref. [43]. In the context 
of BD simulations of colloidal suspensions, the problem 
of positive-definiteness does not arise due to excluded- 
volume interactions, while near-field hydrodynamic in- 
teractions are taken into account through short-range lu- 
brication forces [36]. 

In this paper, four aspects of the implementation and 
optimisation of BD simulations for semidilute polymer 
solutions are considered: (i) The development of an al- 
gorithm that scales like 0{N^-^) for calculating pair-wise 
hydrodynamic interactions in periodic systems; (ii) The 
derivation of a modified version of Beenakker's periodic 
RPY tensor that is applicable to simulations of solutions 
under conditions; (iii) The optimal implementation of 
Fixman's polynomial approximation to the decomposi- 
tion of the diffusion tensor, within the context of the cur- 
rent BD simulation algorithm and (iv) Optimisation of 
the overall algorithm for a single Euler time step. Finally, 
the resulting optimised BD algorithm has been used to 



calculate a variety of equilibrium properties of semidilute 
polymer solutions, across a range of concentrations, and 
compared quantitatively with the results of the LB/MD 
algorithm, along with a comparison of the CPU time scal- 
ing for both these approaches. 

The plan of the paper is as follows. In Sec. II, the gov- 
erning equations, along with the implementation of the 
Ewald sum and its modification to handle overlapping 
beads, are discussed. Sections III, IV, and V consider the 
optimisation of (i) the Ewald sum for hydrodynamic in- 
teractions, (ii) the Chebychev polynomial approximation 
for the decomposition of the diffusion tensor, and (iii) 
the execution of a single Euler time step, respectively. 
The optimised BD algorithm is validated by a variety of 
different means, under both 6 and good solvent condi- 
tions, in Sec. VI, and in Sec. VII, its computational cost 
is compared with that of the LB/MD method at a con- 
centration that lies in the semidilute regime. Finally, the 
principal conclusions are summarised in Sec. VIII. 



II. MODEL AND SIMULATION METHOD 

A. Governing equation 

A linear bead-spring chain model is used to repre- 
sent polymers at the mesoscopic level, with each poly- 
mer chain coarse-grained into a sequence of Ni, beads, 
which act as centres of hydrodynamic resistance, con- 
nected by A*";, — 1 massless springs that represent the en- 
tropic force between adjacent beads. A semidilute poly- 
mer solution is modelled as an ensemble of such bead- 
spring chains, immersed in an incompressible Newtonian 
solvent. A total of Nc chains are initially enclosed in a 
cubic and periodic ceU of edge length L, giving a total of 
N = Ni, X Nc beads per cell at a bulk monomer concen- 
tration of c = N/V, where V ~ L^ is the volume of the 
simulation cell. Using the length scale Ih = x/ksT/H 
and time scale Xh = C/4-ff , where fc^ is the Boltzmann's 
constant, T is the temperature, H is the spring constant 
and C is the hydrodynamic friction coefficient associated 
with a bead, the Euler integration algorithm for the non- 
dimensional Ito stochastic differential equation governing 
the position vector r^(i) of bead i' at time t, is [10] 
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v,{t + At) = r,(t) + [k ■ r,(t)] + ^ ^ [D.^ W ' F^W 
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Here, the 3x3 tensor k is equal to (Vv) , with v be- 
ing the unperturbed solvent velocity. The dimensionless 
diffusion tensor D^^ is a 3 x 3 matrix for a fixed pair 
of beads /i and v. It is related to the hydrodynamic in- 
teraction tensor, as discussed further subsequently. F^ 
incorporates all the non-hydrodynamic forces on bead /j. 



due to all the other beads. The components of the Gaus- 
sian noise AW^ are obtained from a real-valued Gaus- 
sian distribution with zero mean and variance At. The 
quantity B^^ is a non-dimensional tensor whose presence 
leads to multiplicative noise [20]. Its evaluation requires 
the decomposition of the diffusion tensor. Defining the 
matrices X> and 3 as block matrices consisting oi N x N 
blocks each having dimensions of 3 x 3, with the (v, ii)- 
th block of "D containing the components of the diffusion 
tensor D,^^ , and the corresponding block of B being equal 
to By^, the decomposition rule for obtaining B can be ex- 
pressed as 



BB^ = -D 



(2) 



The non-hydrodynamic forces in the model are com- 
prised of the spring forces F^^"^ and excluded- volume in- 
teraction forces F;^^, i.e., F^ = F^p' + F"''^. A hnear 
Hookean spring potential is used here for modelling the 
spring forces when considering the optimisation of the 
Ewald sum, while a finitely extensible nonlinear elastic 
(FENE) potential has been used while comparing results 
with the Lattice Boltzmann method. The entropic spring 
force on bead /x due to adjacent beads can be expressed 
as F;,p^ = F^(Q^) - F^(Q^_i) where F^(Q^_i) is the 
force between the beads /i — 1 and /i, acting in the di- 
rection of the connector vector between the two beads 
Q ]^ = r^ — r^_i. The dimensionless Hookean spring 
force is given by F^(Q^) 

^^^^^^ ^ thqTa , , , 

mcnsionless finite extensibility parameter, and qq is the 
dimensional maximum stretch of a spring. 

The non-dimensional diffusion tensor D^^ in Eq. (1) is 
related to the non-dimensional hydrodynamic interaction 
tensor f2 through 



Q^, while for FENE springs, 
Hql/ksT is the di- 



where b 
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where 5 and 5^i, represent a unit tensor and a Kronecker 
delta, respectively, while i2 represents the effect of the 
motion of a bead ji on another bead v through the dis- 
turbances carried by the surrounding fluid. The hydro- 
dynamic interaction tensor fi is assumed to be given by 
the Rotne-Prager-Yamakawa (RPY) regularisation of the 
Osecn function 



i2{Y) = Qi8 + Q2^ 



(4) 



where for r > 2a, the branch A of the RPY functions i7i 
and l?2 is given by, respectively. 
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while for < r < 2a, the branch B of the RPY functions 
i?! and ^2 is given by, respectively. 
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9 r 
32a 
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3 r 
32a 



(6) 



We introduce the notation of the two branches A and 
B for facilitating subsequent discussion. The quantity a 
has been introduced here as the non-dimensional radius 
of the bead as an additional independent parameter. It 
is related to the conventionally defined [48, 49] hydro- 
dynamic interaction parameter /i* by a = ^/lih* . As is 
well known [29], the sum X^u ^j^/j ' -^m i^ ^'^- i^) *^'-"^" 
verges slowly since D^i^ is long-ranged in nature, scaling 
as l/r. The problem of slow convergence can be resolved 
through the use of the Ewald sum, as discussed in greater 
detail below. It is worth noting here that it is sufficient 
to evaluate ^ D^^ • F^ in order to determine the time 
evolution of r^(t). It is not necessary to know D^^ ex- 
plicitly. Further, as will be seen later, the evaluation of 
^ B^^ • AWp using a Chebyshev polynomial approx- 
imation for Bjy^, also requires a repeated evaluation of 
the Ewald sum. 



B. Evaluation of '^ D,y;j • F^j as an Ewald sum 

Beenakker's [29] representation of the sum ^ D^^-F^ 
as an Ewald sum for infinite periodic systems, using the 
RPY tensor to represent hydrodynamic interactions, has 
the form 



Er, ^ /-, 6aa 40a^Q;^\ ^ 

D.,.F,= 1-^ + ^^lF. 



p=i 



5:'5:MW(r.,.„).F,+5:M(^)(k). 
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n A^— 1 



\l^0 



I cos(k • r^) ^ cos(k • r^)Fp 

N . 

- sin(k • r,y) ^ sin(k • r^)F^ \ 



(7) 



where the first and the second sums on the right hand 
side, both of which converge rapidly, are carried out in 
real and reciprocal space, respectively. The first term 
on the RHS is the correction due to self-interactions and 
does not involve any summation. The parameter a de- 
termines the manner in which the computational burden 
is split between the two sums. The vector r^^^„ is defined 
by V 



i^/i,n 



V — Y^ + nL, where n = (n^,, riy, n^) is the 
lattice vector with nx,ny,nz being integer numbers (see 
Fig. 1). The first summation on the RHS of Eq. (7) is 
carried out in the original simulation box and over all the 
neighbouring periodic images. The prime on the summa- 
tion indicates that the lattice vector n = is omitted for 
ly — II. M^ ^(r) is a 3 X 3 matrix (in real space), which 
depends on a and a, and M^ ' (k) is also a 3 x 3 matrix (in 
reciprocal space), which depends on a, a and the volume 
of the simulation box V. The expressions for M^ '(r) and 




FIG. 1. (Color online) Periodic boundary conditions in 2-D: 
demonstration of the distance vector between two beads 



M(^)(k) are 
M(^^(r) = 
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with erfc denoting the complementary error function, and 
a^k^\ ( e fc4 



M(2)(k) = 



\k''v)'''^^\^o? 



5 kk 



(9) 



The second summation in Eq. (7) (denoted here as the 
reciprocal space sum) is carried out over lattice vectors 
k — 2-KwlL. In Eq. (8), r and f are the magnitude and 
unit vector, respectively, in the direction of r. In Eq. (9), 
k and k are the magnitude and unit vector, respectively, 
corresponding to k. 



C. Modification of the Ewald sum to account for 
overlapping beads 

As pointed out earlier, the derivation of the Ewald sum 
by Beenakker [29] is valid only for the branch A of the 



RPY functions fi\ and fl^ (Eq. (5)), which forbids its use 
for the case of overlapping beads (r < 2a). The original 
expression consequently cannot be used for the simula- 
tion of B solvents by neglecting excluded- volume interac- 
tions, as in this case beads on the same or on different 
chains are highly prone to overlap with each other. The 
Ewald sum has been modified here to account for such 
situations. 

Starting from a given bead v, we consider all those 
beads that have distances less than 2a from it, includ- 
ing bead v itself. By a proper re-labeling, we can as- 
sume that these are the beads [i— 1, . . . , N* . The num- 
ber of non-overlapping particles is thus iV — N* . As 
the correction needs to be carried out only in the real 
space sum, the first summation on the RHS of Eq. (7) 

(En E^=i M(^nr.;.,n) • F^) is replaced by 



N' 
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^MW(r.^,„).F, 
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n pi=N' + l 



M^'Hr.^.n) • F, 



(10) 



where the first summation of Eq. (10) is carried out only 
over overlapping particles in the original simulation box 
(n = 0). Similar to Beenakker's [29] derivation of the 
M' ' matrix, the matrix Mb is derived here based on 
the branch B of the RPY tensor given by Eq. (6). The 
second summation is carried out over the periodic images 
of the overlapping particles (whose distances are more 
than 2a) , and the third summation is similar to that given 
in Eq. (7) but here it is carried out only over the non- 
overlapping particles. Note that the second sum is not 
carried out in the original box. In order to make this sum 
extend over the original box and periodic images, a term 
is added and subtracted as follows 

N* N' 

115^0^=1 






N* 



J2 Wl''nr.M.n=o) • F^ - 5] M(i)(r,^^„=o) • F, 
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+ J2 J2 MW(r.M,n)-F^ 

n fj,=N' + l 

The second, third and fifth summations of the above 
equation together represent the original real space sum 
in Eq. (7). Eq. (11) can consequently be rearranged as 

f^'f MW(r.„„).F,) 

\ n M=l / 



N' 



fi,ii—0 J 



(12) 






where the second summation is carried over ah overlap- 
ping particles in the original box. Denoting the second 
term in Eq. (12) by M*, it is straightforward to show that 



M*(x) 



1 /3cc2 ^ 



1 
2^ 



'ix^ 



-1 



(13) 



where x = r(,y;^,n=o)/Q^ a-nd x is the unit vector in the 
direction of r^^^ jj^o)- The modified form of the Ewald 
sum that is valid for arbitrary inter-particle distance is 
consequently 



1, Choice of Ewald parameters 

At fixed monomer bulk concentration c, the box size 
increases as N^/^ . As can be seen from Eqs. (7) and (8), 
the convergence of the real space sum depends on the 
complementary error function erfc(ar), where r is the 
distance between a pair of beads. In practice, the sum 
is evaluated only for r < r^, where Tc denotes a cutoff 
radius. The value of a is chosen such that erfc(Q;rc) is 
small. At large values of the argument, erfc(Q;rc) behaves 
as exp(— a^Tc^). If we specify M such that exp(— M^) is 
very small, then 
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M/r, 
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Similarly the rate of convergence of the reciprocal space 
sum is controlled by the factor exp(— fc^/4a^). If it is 
required [42] that the accuracy of the real space sum is 
roughly equal to that of the reciprocal space sum at the 
reciprocal space cutoff, K , then using Eq. (15) we find 



M^ = K^/{'ia^) 01 K = 2aM = 2M^/r 
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D. Implementation of the Ewald sum 

As discussed earlier, the real space sum is carried out 
over all the periodic images while the reciprocal space 
sum is performed only in the original simulation box. 
There are three parameters which control the accuracy 
of both the real and reciprocal space sums: Umax, an nation of terms of the form exp(ik • r^). The method 



These relations allow us to specify a and K for given 
values of M and r^ while the latter parameters control 
the accuracy and speed of the algorithm, as discussed 
subsequently. 



2. The real space and reciprocal space sums 

Locating all pairs of beads which are separated by less 
than the cutoff distance Tc is the first step in evaluating 
the real space sum. A naive all-pairs neighbour search 
results in 0{N'^) performance and therefore the link-cell 
method, which is a cell-based neighbour search method, 
is used to improve the performance [50-52]. The calcu- 
lation of the infinite real space sum is thus reduced to 
the calculation of the sum locally over only a small num- 
ber of neighbouring beads. Here, the neighbour search is 
implemented with cells of side rc/5. 

The reciprocal space sum is more straightforward to 
implement. The major effort is expended in the eval- 



integer which defines the range of the real space sum 
(governed by the number of periodic images, see Fig. 1), 
kmaxi an integer that defines the summation range in re- 
ciprocal space and the Ewald parameter a. These three 
parameters are related to each other from the point of 
view of accuracy and speed. A large value of a makes 
the real space sum converge faster (since a smaller value 
of Umax is required). However, this leads to the recipro- 
cal space sum requiring a larger number of wavevectors 
kmax- On the other hand, a small value of a implies an 
expensive real space sum but a cheaper reciprocal space 
sum. The optimal choice of these parameters has been 
discussed previously by Fincham [42] in the context of 
electrostatic interactions. Here, a similar study is per- 
formed for hydrodynamics interactions. 



adopted here precomputes the components of these fac- 
tors by recursion and stores them [42] . This avoids calling 
the complex exponential function repeatedly. However, 
it involves a substantial amount of computer memory. 



III. 



OPTIMISATION OF THE EVALUATION 
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The Ewald parameter a, which splits the computa- 
tional burden between the real space sum and the re- 
ciprocal space sum, is related to the real space cutoff r^ 
by Eq. (15). The aim of optimisation is to minimise the 
total execution time (which is the sum of the real space 
execution time, Tr and the reciprocal space execution 
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FIG. 2. (Color online) Execution time scaling for the real space and the Fourier space sums for: (a) Constant Vc (b) Constant 



time, Tp), with respect to the real space cutoff r^- Fol- 
lowing Fincham [42] , the execution time Tn is calculated 
as follows. A sphere of cutoff radius Tc contains on aver- 

age iVr^ = ~5~ '^c c beads. Each bead interacts with the 

o 
A^r^ beads that surround it. Since for symmetry reasons, 

each pair interaction needs to be considered only once, 

the execution time is 



1 . 47r o 
TR = -AnN — 4ctr 



(17) 



where An is a constant that depends on the code ar- 
chitecture and tr is the execution time to evaluate one 
interaction, which is found to be 0.15 fis when the BD 
code is run on a 156 SGI Ahix XE 320 cluster. Eq. (17) 
is fit to data obtained by running simulations for a range 
of parameters on a 156 SGI Altix XE 320 cluster, and 
the fitted parameter An is then found to be 5. 

The execution time Tp is evaluated as follows. Within 
the cutoff K, the volume of the reciprocal space is 

47r , 47r 8M^ 

-^— (the latter follows from Eq. (16)). The 



-K^ 



3 3 

reciprocal space points are defined by k 



— (t, m, n) 
Ij 
where l,m,,n are integers and L is the simulation box 

size. The volume of reciprocal space per point is, thus. 



(27r/L)^ and 



is the number of points in the 



3 r3 87r3 

cutoff sphere. Using L^ — N/c to highlight the A^ depen- 
dence at fixed concentration c, the number of reciprocal 

AwM^ N 

space points in the cutoff sphere becomes 5 5- . It 

3 n-^ cr^ 

is worth pointing out that for fixed cutoff radius, the 

number of /c-space points increases as A^, because the 

concentration of points in reciprocal space increases with 

system size. Further, inversion symmetry of reciprocal 

space halves the number of reciprocal space points men- 



tioned above. A sum over the A^ beads must be per- 
formed for each fc-space point, so the execution time is 



TF = Ai 



1 47r M^ N'^ 

2 3 TT'^ cr^ 



■tf 



(18) 



where Ap is a code architecture constant and tf is the 
execution time to evaluate one term in the sum, which 
is found to be 0.33 /is. As in the real space instance, 
Eq. (18) is fitted to simulation data to obtain Ap = 0.19. 
The total execution time is consequently 



T ■ 



Tr + Tp 

1 47r 

2 Y 



AB,Nrlctr + Ai 



M^ N"^ 



(19) 



Equation (19) shows that, for fixed M and Vc, Tr varies 
as A^, but Tp varies as A^, because of the increasing con- 
centration of points in reciprocal space. This behaviour is 
demonstrated for c = 4.44 c* and Tc = 10 in Fig. 2 (a) for 
the simulation data (symbols), which agrees with the ex- 
pressions given in Eqs. (17) and (18) (solid lines). Here c* 

[47r 
is the overlap concentration defined by Ni,/ 



-{R'f 



where R'^ is the radius of gyration of a polymer chain in 
the dilute limit. To increase the value of A^, we fix the 
value of beads per chain at A^ = 10 and increase the 
number of chains Nc- Gonversely, if re is increased as 
the system size increases in such a way that rc/L is con- 
stant (as in the approach adopted by Stoltz et al. [10]), 
then since c = N/L^, Tr varies as A^^ but Tp varies as 
A^. Figure 2 (b) displays this behaviour for Ah — 10, 
c — AAAc*, M — 3.3 and L/rc = 3. Once again the 
simulation data is seen to match the expressions given 
in Eqs. (17) and (18). This suggests that by appropriate 
choice of parameters it may be possible to achieve better 
than A^^ behaviour in the total time T. For a given accu- 
racy, the only free parameter is Tc, since this determines 
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. 3. (Color online) (a) Total execution time vs. exponent x for various A'^ (b) Power law scaling for Tr, Tf and Topt 

opt 



at 



a and hence K by Eqs. (15) and (16). To find the value 
of Tc which minimises the total execution time, we set 
dT/dr^ = 0. This leads to 



M (Aptf 



' opt 



-y/TT \ARtr 



1/6 



7VV6 

cl/3 



(20) 



Thus the optimal choice of the cutoff radius (r^) in- 
creases slowly (1/6'^ power) with N . The validity of 
Eq. (20) has been verified by carrying out simulations. 
Assuming that the quantity A^^/^ in Eq. (20) is replaced 
by N^ , various values of the exponent x are selected in 
place of the exponent 1/6, and the total execution time 
for a given value of N is estimated. Fig. 3 (a) shows the 
total execution time as a function of the exponent x and 
it is clear that the minimum execution time is achieved 
when X — 1/6 as given by Eq. (20), for all N. 

Substituting (r^) in Eq. (19) we find for the optimal 
time 

Topt = 2Tr ^2Tf^^ N'-^ ^ ^ARtrAptf (21) 

O IT 

Thus, when the total time is optimised, it is equally di- 
vided between the real and reciprocal space parts of the 
calculation. This is verified in Fig. 3 (b), which displays 
plots of Tji, Tp and T as a function of A^, at a; = 1/6 
and for Nb = 10, c = iAic* and M = 3.3. Symbols 
indicate simulation data and solid lines correspond to 
Eq. (21) with the appropriate values for the various pa- 
rameters. Eq. (21) also indicates that the real space, 
reciprocal space and total time scale as A^-'^-^. Simulation 
results shown in Fig. 3 (b) substantiate this prediction. 
These results are similar to those obtained by Fincham 
[42] in the context of electrostatic interactions. 



IV. DECOMPOSITION OF THE DIFFUSION 
TENSOR 

In component form, the decomposition rule (Eq. (2)) 
for obtaining the block matrix B can be expressed as 
follows, 



N 



EE 

0=1 q=l 



ryrq rysq 
'^U0 ^M/3 



v: 



(22) 



where {i^, /?, ^ = 1, . . . , A^}, {r, q,s = 1, 2, 3}, and 2?^^ is 
the 're'"^ Cartesian component of the tensor D^^. The 
matrix B is not unique. Assuming that B is a lower (or 
upper) triangular matrix leads to its calculation using a 
Cholesky decomposition of "D, which as mentioned ear- 
lier, requires 0{N^) operations. Fixman's [24] approach 
achieves an attenuation of this CPU intensity by recog- 
nising that (i) it is sufficient to find 3 approximately, and 
(ii) the individual columns of the matrix 13 are in them- 
selves not of much interest, only the vector dS = B ■ AW 
is required, where AW is a vector consisting of the 3N 
Gaussian noise coordinates Aiy* with /i = 1,...,A^, 

and s = 1,2,3. By assuming that B = vD, and using a 
Chebyshev polynomial approximation for the square root 
function, Fixman showed that [24-26, 53] 



dS = VV- AW : 



A^Ch-l 

E ^f 

p=0 



V, 



I AW 



(23) 



where A'ch is the number of terms used in the Chebyshev 
polynomial approximation, Cp are the Chebyshev coeffi- 
cients, and the vectors Vp are defined by the recurrence 
relations 



Vp^2y- Vp-i - V 



p-2 



p> 2 



(24) 



with Vo = AW and Vi == ^ • Vo- The linear mapping 



y 



■^max LAmm 



V 



^max i^niin 



(25) 



(where I denotes the 3A^ x 3A^-dinicnsional identity ma- 
trix), ensures that the eigenvalues of y he in the domain 
[—1,1] when the eigenvalues of "D are within [dmin, ^max] • 
This is essential for the validity of the Chebyshev approx- 
imation. 

It is clear that for a given A^ch, the cost of the direct 
calculation of the 3iV-dimensional vector dS, without the 
intermediate calculation of J3, is proportional to Nq^ x 
[the cost of evaluating Vp] . The number of terms A^ch 
that are required is determined by the desired accuracy 
in the estimation of the square root. The choice of iVch is 
also affected by the necessity of ensuring that the bounds 
c^max and dmin Satisfy the following constraints relative to 
the maximum (Amax) and minimum (Amin) eigenvalues of 
X>, namely, d^ax > A^ax and d^in < Amin- 

The CPU cost involved in adopting Fixman's proce- 
dure for dilute polymer solutions have been discussed in 
depth in Refs. 25, 26, and 53. Here, we briefly sum- 
marise the main conclusions: (i) The cost of evaluating 
Vp is simply 0{N'^). (ii) The number of terms in the 
Chebyshev approximation is determined using the ex- 
pression [26, 27] 



A^Ch ~ nint 



An 



An 



where nint is the nearest integer function. The use of 
Eq. (26) is motivated by the finding [24, 25] that the 
value of A^ch required to keep the error in the estima- 
tion of the square root below a fixed tolerance, scales 
as (Amax/Amin)^- (hi) The limiting eigenvalues Amax and 
Amin can be calculated exactly in 0(N'^) operations using 
standard software packages — this procedure was adopted 
in Ref. 25 with the package ARPACK. On the other hand, 
Kroger et al. [26] and Prabhakar and Prakash [27] avoid 
explicit evaluation of the eigenvalues, but instead obtain 
approximate estimates for Amax and Amin- In particular, 
Prabhakar and Prakash [27] use the following expressions 
based on a suggestion by Fixman [24] 



A: 



A 



1 



ixman 
lax 


= 3^^^ 


ixman 
lin 


^3^^^ 



■ vw 



■DU 



and 



(27) 



where W^ is a 3A^-dimensional vector, all of whose ele- 
ments are equal to 1 and 14^ is a 3A^-dimensional vec- 
tor with alternating I's and — I's. Further, the bounds 
rfmax = 2A^™^" and dmin = O.SA^™-" were chosen 
to satisfy the conditions on the magnitudes of dmax and 
c^min relative to the maximum and minimum eigenvalues. 
Since for dilute polymer solutions, (Amax/Amin) ~ Af°'^ 
and consequently A'ch "^ N^/^, the CPU-time require- 
ment for the calculation of dS in Fixman's method scales 
as TVchAf' - N^'^. 



In the case of semidilute polymer solutions, since deter- 
mining Vp requires the recursive evaluation of the prod- 
uct of a linear transformation of the diffusion tensor with 
various SA^-dimensional vectors (see Eq. (24)), the Ewald 
sum can be used for its evaluation, with the force vec- 
tor F^ in Eq. (7) replaced by the relevant vector in the 
Chebychev recursive relationship Eq. (24). Thus the cost 
of evaluating Vp is identical to the cost of carrying out 
the Ewald sum. With the optimisation introduced here, 
this would imply a cost that scales as 0{N^-^). The is- 
sues of determining the number of terms A^ch, and the 
maximum and minimum eigenvalues of 1?, must also be 
addressed before the total cost of Fixman's procedure in 
the context of semidilute solutions can be estimated. 

As pointed out earlier, it is not necessary to know the 
diffusion matrix X> explicitly in order to describe the con- 
formational evolution of polymer molecules in a semidi- 
lute solution. However, since Beenakker [29] provides an 
expression for the periodic diffusion tensor D^p in his 
original derivation, it can used to determine the exact 
values of the maximum and minimum eigenvalues, de- 
noted here by Amax* and AJ^^'^' . By comparing the values 
given by Eqs. (27) with the exact values (obtained with 
the gsLeigen_symm subroutine of the GNU Scientific Li- 
brary) , we find that the behaviour for our semidilute sys- 
tem is quite different from what is known for single-chain 

'" is a reason- 



simulations: While in the latter case, \^'^^ 
able approximation to AJ^a^' (meaning that it scales in 
(26) the same way with the number of beads, with a constant 
ratio of order unity) , we here find that Am™**" is essen- 
tially independent of A^, while AJ^fJ* increases with A^, 
roughly like N^-^ . In other words, Eq. (27) provides only 
a poor approximation. The reason why the behaviour 
is so different for dilute and semidilute systems is not 
clear to us; we speculate that it might have to do with 
the different density distributions of segments. Neverthe- 
less, we can still use Am™^" for estimating the maximum 
eigenvalue, since we empirically find, for a range of val- 
ues of c/c* , a, Nh and A'c, and for a variety of polymer 
conformations, the relation 



\l 



0.35A^°-6A^™^" 



(28) 



which is therefore used to estimate Amax, assuming that 
it is valid throughout. Similarly, we find empirically that 
the lowest exact eigenvalue is essentially independent of 
the number of segments, i.e. 



\ exact 

max /-1 lVfO.6 



\ exact 



(29) 



where the pre- factor C depends on the values of c/c*, 
a and A^t, increasing with an increase in a and Nf, and 
decreasing with an increase in c/c*. For instance, for 
c == 4.44c*, a = 0.5 and Nb = 10, we find C == 8. 
It follows that in the course of simulations a fairly ac- 
curate estimate of the minimum eigenvalue can be ob- 
tained, once Amax is determined, by using the expression 



An 



Amax/(CAf°'^)- In general, the value of C is 
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obtained by trial and error. Once Amax and Amin are de- 
termined by this procedure, we find that it is adequate to 
use the bounds rfmax — Amax and dmin = Amin to ensure 
a robust implementation of the Chebychev polynomial 
approximation. 

With regard to the number of Chebyshev terms, we 
find that for scmidilute solutions, as in the case of dilute 
solutions, the value of A^ch required to keep the error 
in the estimation of the square root below a fixed toler- 
ance, scales as (Amax/Amin)'- This immediately suggests 
from Eq. (29) that the CPU-time requirement for the 
calculation of dS for semidilute solutions using Fixman's 
method scales as A^ch-^^'^ ^ N^-^. Equation (26) is used 
here to provide an initial guess for iVch, which is then in- 
crementally increased until the relative error Ej in the 
estimation of the square root function, given by the fol- 
lowing expression suggested in Ref. 25, 



Ef 



{B ■ AW) • (B • AW) - AW • X> • AW | 
AW • T> ■ AW 



1/2 



(30) 



is less than a specified tolerance. In practice we find that 
the choice of C affects the efficiency with which the final 
value of Nch is obtained. 



V. OPTIMISATION OF EACH EULER TIME 
STEP 

The implementation of the Euler algorithm used here 
to determine the configurational evolution of the system 
requires the estimation of the drift term ^ D^^ • F^ 
and the diffusion term ^ B^,^ ■ AW^ at each time step, 
since the algorithm proceeds by evaluating the right hand 
side of Eq. (1) for each bead v in the original simulation 
box. As mentioned earlier, determining the latter sum in- 
volves the repeated invocation of the Ewald sum. Since 
the spatial configuration of the system is frozen in a sin- 
gle time step, all terms in the Ewald sum that are either 
(i) constant, (ii) only dependent on the reciprocal space 
vector k, or (iii) only dependent on the spatial config- 
uration, do not have to be repeatedly evaluated. As a 
result, it becomes necessary not only to discuss the opti- 
mal evaluation and scaling with system size of the drift 
and diffusion terms individually (as we have in Sees. Ill 
and IV), but also to consider the overall optimisation of 
each Euler time step. 

It turns out that there are two ways in which this op- 
timisation can be carried out. In order to give a flavour 
of the issues involved, we only discuss here the treatment 
of the term involving M^ (rjyp,n) in the Ewald sum (see 
Eq. (14)). The remaining terms are either treated simi- 
larly, or entail a more straightforward treatment. Clearly, 
the term involving M' (fiy^^.n) is first evaluated when the 
drift term ^ D^fj_ ■ F^ is evaluated. Subsequently, it is 
required each time the term X> • Vp ; p = 0, . . . , Nch ~ 1 
is evaluated in the Chebychev polynomial approximation 
(see Eqs. (24) and (25)). For ease of discussion, we denote 



all periodic images. The nature of (r, 
value of Nr. 



by V^ the 3N components of a typical vector Vp. Then 

the term involving M^ (ri/;^,n) in the implementation of 
the Chebychev polynomial approximation can be writ- 
ten as J2'n E^=i ELi ^Ikn'^t^ where M;^ „ represents 
the 're'**^ Cartesian component of the tensor M^ (r^^^n)- 
Before discussing the two methods of optimisation used 
here, it is worth noting the following point that is com- 
mon to both methods. For each bead v, in any periodic 
image n, the sum over the index /i is carried out only 
over the nearest neighbours of bead v, i.e., over the iV^^ 
particles that lie within a sphere centred at bead v with 
cutoff radius Tc. The choice of re, however, is different in 
the two schemes, as detailed below. 

In the first method of optimisation, which we de- 
note here as HMA (for "High Memory Algorithm"), the 
iN X 3A^ matrix S*^^ — J2n^ufi,n is calculated once and 
for all and stored in the course of evaluating the drift 
term ^ 0,^^ • F^. Note that the cost of evaluating S'^^ 
scales as 0{N x N^J since in each periodic image n, only 
the beads fi whose distance from bead v is less than a 
cutoff radius (rj^'^ ) are considered in the sum over 

HMA) and the 

/ opt 

in this context, is discussed in more detail 
below. It should be noted that the matrix SI," becomes 
increasingly sparse when the system size is increased. 
While it is therefore possible to save memory by sparse- 
matrix techniques (meaning in practice the construction 
of a Verlet table [52] and making use of indirect address- 
ing), this was not attempted here, i.e. we stored the ma- 
trix with a simple 0{N^) implementation. Subsequently, 
each time the term T> ■ Vp ; p = 0, . . . , Nch — 1 is calcu- 
lated in the Chebychev polynomial approximation, the 

0{N'^) matrix multiplication Eu=i Es=i '^I-u'^^u i'' *^^^" 
ried out. Again, a sparse matrix implementation might 
be able to reduce this computational complexity. Ulti- 
mately this term dominates and the total CPU cost of 
this scheme scales as 0(iVch x N"^)- For systems that 
are not sufficiently large, the CPU cost might lie in the 
crossover region between 0{N x iV^J (the cost for the 
deterministic drift) and 0(A^ch x A^^). 

The reason that ir^^^^ , is different from the cutoff 

V c y opt 

radius (rjf ) (calculated earlier for just the evaluation 
of the Ewald sum) is because in the HMA algorithm a 
different procedure is used in the repeated evaluation of 
the Ewald sum, with certain quantities being calculated 
once and for all and stored. By repeating the proce- 
dure adopted earlier for optimising the bare Ewald sum, 
namely, by estimating the total time required to evalu- 
ate the various quantities in the real space and recipro- 
cal space sums, we find for the CPU time for one step in 
nanoseconds 

T^^^/ns = 307V(iV - 1) + 1500iV r^c + 2iN'^{2 + Nch) 

+ 0.67 --iV2 (9.2 + 4.27Vch) (31) 

where the constants reflect the various execution times 
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FIG. 4. (Color online) CPU time scalings for (a) HMA (b) LMA. Symbols represent simulations results, and the lines are 
analytical estimates for the total time. 



for individual terms on a 156 SGI Altix XE 320 cluster. 
Minimising this with respect to the cutoff radius leads to 



that only those beads fi that lie within a cutoff radius 



/'^LMA^ 



opt 



of bead ly are considered in the sum over /i. 



„HMA^ 



M3 



' opt 



[0.257V + 0.12NN, 
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(32) 



It turns out that (r^^^) , > (rf) ^. This is because a 

V c /opt V c y opt 

major part of the real space calculation of ^^ M^ (ri/^.n) 
and M*(ryp.n = o) is not repeated A^ch times in the 
HMA, leading to a cheaper real space implementation. 
As a result, the optimisation procedure allows the HMA 
to attribute a greater computational load to the real 
space sum relative to the reciprocal space sum by having 
a larger cutoff radius than (rf) . In contrast to the 

bare Ewald sum, where Ni-^ ^ iV*''^, we find empirically 
that in the HMA, N,^ - N^-'^ . 

It is clear from Fig. 4 (a) that the CPU time for HMA 
scales as Oi^N"^'^) when the simulation is run at a cut- 
off radius of (r^^^) ^, with the empirically estimated 
exponent 2.1 lying in the crossover regime discussed ear- 
lier. Figure 4 (a) also indicates that the CPU cost is 



Note that for each bead /i, the sum over s involves a 
simple (3 X 3) X (3 X 1) matrix multiplication, (ii) The 
sum X^„r^„ over periodic images n is then performed 
to obtain the required quantity in the Ewald sum. 

Since, even in the LMA, some quantities are stored 
during the evaluation of the drift term ^ D^^ • F^, we 
can optimise the entire process involved in executing one 
time step in the Euler algorithm by choosing the appro- 
priate cutoff radius. Adopting the procedure described 
earlier, we find for the CPU time per step in nanoseconds 



T 



LMA 



/ ns = 1200 r^cN {I + Nch) 



M^ 



-iV2(1.8+1.5iVch) 



{r^m.^^ 



AP 



0.18N + 0.15NN^ 



Ch 
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(33) 



(34) 



greater when 



' opt 



is used in the HMA, confirming the Figure 4 (b) compares the CPU cost involved when ei- 



necessity to optimise the total procedure for evaluating a 
single Euler time step rather than using the cutoff radius 
obtained from the Ewald sum optimisation. 

The major difference from the HMA, in the alternative 
method of optimisation used here (denoted by LMA for 
"Low Memory Algorithm"), is the treatment of the sum 

EnE^=iELi^^^M,n"I^M- ^^^'^^ many quantities, such 
as those that are constant, or only functions of the re- 
ciprocal space vector k, are still calculated and stored 
once and for all when the drift term ^ D^^ • F^ is 
evaluated, the 3N x 37V matrix S'" is not stored. In- 
stead, the following two steps are carried out: (i) For 
a given bead index i' and periodic image n, the quan- 
tity T^,n = E^=i ELi ^^M.n'I^M ^^ evaluated, ensuring 



y,LMA\ 



opt 
LMA\ 



' opt 



and 



) ^ is used in the 

/ opt 

(''^)opt are nearly 



ther the cutoff radius ( 
LMA. The reason that 

equal to each other is because practically all the time 
consuming parts of the Ewald sum are calculated repeat- 
edly TVch times in the LMA. As a result, in contrast to 
the HMA, the saving achieved by storing some quanti- 
ties does not make a significant difference to the choice 
of cutoff radius. 

Unlike the HMA, there is no large storage requirement 
in the LMA, as shown in Fig. 5 (a), where it is seen to 
scale as 0{N) for sufficiently large iV. Further, the CPU 
cost scales as 0{Nch x N x N^^), which is identical to 
the scaling for the basic Ewald sum, namely, 0{N^-^) as 
can be seen in Fig. 5 (b). On the other hand, since S"^^ 
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FIG. 5. (Color online) Comparison between HMA and LMA for (a) Computer memory requirement (b) CPU time required for 
a single time step computation 



is not stored, the components M^^ „ are repeatedly eval- 
uated in each of the recursive Chebychev calculations. 
This extra calculation leads to a large pre-factor in the 
scaling of the CPU with N. However, at iV -- 35000 a 
crossover in the CPU cost can be seen to occur, suggest- 
ing that it is advisable to use HMA below a system size 
of roughly 35000, while the LMA would be cheaper for 
larger systems. 



VI. 



TESTING AND VERIFICATION 



The optimised BD algorithm developed here is vali- 
dated by testing and verification under both 0-solvent 
and good solvent conditions. In the former case, we first 
check to see if static equilibrium properties, namely, the 
radius of gyration and the end-to-end vector, agree with 
known analytical results. Secondly, the current imple- 
mentation of the Ewald sum for hydrodynamic interac- 
tions (which enables its use even in simulations that do 
not incorporate excluded-volume interactions) is tested 
by comparing the prediction of the infinite dilution equi- 
librium self-diffusion coefficient, which is a dynamic prop- 
erty, with the results of a BD simulation of single chain 
dynamics. As mentioned in Sec. I, we have recently quan- 
titatively compared the predictions of the explicit solvent 
LB/MD method with the predictions of the implicit sol- 
vent BD method for the dynamics of a single chain under 
good solvent conditions in the dilute limit [21]. A natural 
follow up of the development of the current BD algorithm 
is to compare the two methods at finite concentrations 
under good solvent conditions. Here we extend our ear- 
lier study by comparing the predictions of the radius of 
gyration, the end-to-end vector, and the self-diffusion co- 
efficient. This serves both to verify the predictions of 



the current algorithm in a regime where there are no 
analytical predictions, and to obtain an estimate of the 
relative computational costs of the two mesoscopic sim- 
ulation methods in the semidilute regime. 

The mean-square end-to-end distance is given by 

{Rl) = {{r^,^r,f) (35) 

while the mean-square radius of gyration is given by 



(^i)=^j:j:i^i) 



(36) 



b fj_=i u=l 



with r^^ = jr^ — r^j being the inter-particle distance. 
The long-time self-diffusion coefficient is calculated by 
tracking the mean-square displacement of the centre of 
mass Yr of each chain 



Dj 



lini ' 



|rcft)~r,(0)|^ 
6i 



(37) 



The predictions of the radius of gyration, the end-to- 
end vector, and the self-diffusion coefficient by the cur- 
rent algorithm under ^-solvent and good solvent condi- 
tions, and their verification by various means, are dis- 
cussed in turn below. 



A. ^-solvents 

The mean-square end-to-end distance and the mean- 
square radius of gyration at equilibrium were obtained 
by carrying out simulations of bead-spring chains with 
Hookean springs, using A^f, = 20 and 40 and a fixed num- 
ber of chains Nr = 20. The non-dimensional bead-radius 
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FIG. 6. (Color online) Validation of static properties under ^-conditions: (a) Mean-square end-to-end distance (R^) (b) Mean- 
square radius of gyration {R^). Symbols indicate simulation data, while solid lines represent the analytical results given by 
Eqs. (38) and (39). 



a was chosen to be 0.5, and a time step At = 0.01 was 
used to carry out the Euler integration. A range of con- 
centrations from 3 X 10"'* c* to 3 c* were considered, with 
the concentration being varied by changing the size of the 
simulation box L. Since the chains are free to cross each 
other for ^-solvents, static properties such as the end-to- 
end distance and the radius of gyration are independent 
of concentration. Further, as is well known, their depen- 
dence on Nb can be shown analytically to be [49] 



0.06 



(Rl) - 3im - 1) 



and 



{Rl) - 



Nl-l 

2Nb 



(38) 



(39) 



Note that c* can be determined once a choice for N^ is 
made. 

Figures 6 (a) and (b) display the results for {R^) and 
(i?g), respectively, for the range of concentrations consid- 
ered here. Symbols indicate simulation data, while solid 
lines represent the analytical results given by Eqs. (38) 
and (39). Clearly, the simulated static properties are in 
good agreement with analytical predictions. 

The single chain diffusion coefficient in a dilute solution 
under 0-solvent conditions is used here as the benchmark 
for verifying the current implementation of the Ewald 
sum. The value of the diffusion coefhcient for N^ — 20 
and a — 0.5 is displayed as the solid line in Fig. 7, ob- 
tained here by a conventional BD simulation algorithm 
that uses a semi-implicit predictor corrector scheme de- 
veloped in our group for simulating a single chain that is 
not confined in a box [27]. For the same set of input sim- 
ulation parameters, the long-time diffusivity is obtained 
from the current multi-particle BD algorithm for a range 
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FIG. 7. (Color online) Long-time self-diffusion coefficient un- 
der ^-solvent conditions. Symbols indicate simulation data 
obtained with the current multi-particle algorithm, while the 
solid line represents the value obtained by simulating the dy- 
namics of a single chain in a dilute solution. The circle symbol 
on the j/-axis is the value obtained by extrapolating the finite 
concentration results to the limit of zero concentration. 



of concentrations. It is clear from Fig. 7 that the sim- 
ulated data (symbols) for the diffusivity Dl approaches 
the single chain result in the limit of zero concentration. 
The value of Dl at c/c* = was obtained by fitting the 
values at c/c* — 0.001, 0.003 and 0.01 with a second order 
polynomial and extrapolating to zero concentration. 
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TABLE I. Comparison of predictions of the radius of gyration, the end-to-end vector, and the self-diffusion coefficient by the 
explicit solvent LB/MD method with the predictions of the implicit solvent BD method, for a bead-spring chain with Nt — f 
at three different concentrations, in a good solvent. Note that all properties are given in BD units, except the box size L, and 
concentration c, which are given in LB units when reported for the LB/MD simulations. Both L and c are identical in both 
methods when reported in the same unit system. Note that the highest concentration corresponds to melt-like conditions. 



Nc 


Method 


L 


c 


c/c* 


(Rl) 


(Rl) 


Dl 


20 


BD 


24.152 


0.0142 


0.546±0.001 


111.37±0.47 


18.36±0.04 


0.0272±6xl0"* 




LB 


10 


0.2 


0.543±0.005 


112.93±0.27 


18.29±0.02 


0.0268±8xl0-'' 


32 


BD 


21.737 


0.0311 


1.199±0.002 


98.35±0.43 


16.6±0.04 


0.0162±6xl0"* 




LB 


9 


0.439 


1.192±0.011 


99.11±0.36 


16.59±0.03 


O.OlSlilxlO""^ 


70 


BD 


21.737 


0.068 


2.623±0.004 


76.07±0.53 


13.33±0.05 


0.0024±1 xlO^* 




LB 


9 


0.96 


2.607±0.025 


77.29±0.35 


13.42±0.04 


0.00245±5xl0-=' 



B. Good solvents 

In order to carry out a quantitative comparison be- 
tween the LB/MD and BD methods, it is necessary to 
ensure that the underlying polymer model is identical for 
both the methods, and to map the input parameters of 
the hybrid model onto the input values of the BD model. 
A detailed discussion of how this can be achieved in the 
context of dilute solutions has been given in Ref. 21. Ex- 
actly the same procedure has been adopted here. Essen- 
tially, a bead-spring chain with FENE springs is used, 
with a Weeks-Chandler-Andersen potential, which acts 
between all monomers, employed to model the excluded- 
volume (EV) effect. While in the LB/MD simulation 
approach, the Weeks-Chandler- Andersen parameters are 
used to define the units of energy, length, and time, the 
corresponding units in the BD simulations have been dis- 
cussed earlier in Sec. II A. We refer readers to Ref. 21 for 
details of the length and time unit conversions between 
the two methods. The comparison of the two methods 
proceeds by first picking the simulation parameters for 
the LB/MD model, using these for the LB/MD simula- 
tions, then converting them to BD units using the proce- 
dure outlined in Ref. 21, and finally running the equiva- 
lent BD model obtained in this manner. In other words, 
the two units systems are maintained in the respective 
methods, and a comparison of predicted quantities car- 
ried out a posteriori. 

The results of carrying out this procedure for (i?^), 
{Rp and Dl are shown in Table (I) for Nb ^ 10 at 
three different concentrations. It is worth noting that, 
since EV interactions are short-ranged, we have imple- 
mented a neighbour-list in the current BD algorithm for 
computing the pairwise summation of EV interactions, 
with a cutoff radius equal to the range of the Weeks- 
Chandler-Andersen potential. All values are reported in 
BD units, unless specified otherwise. We find it conve- 
nient to maintain the same absolute concentration in the 
two methods rather than the same c/c*, as this would 
entail an interpolation procedure. In the BD method, c* 



is determined from (i?^) in the dilute limit, by carrying 
out a single chain simulation for parameter values that 
are identical to those in the multi-particle BD simulation. 
In the LB/MD method, simulations are carried out for 
three box sizes, L = 12, 17 and 21, with the number of 
monomers held fixed (we use Nc = 20 and iV;, = 10). As 
a result, the monomer concentration decreases with in- 
creasing box size. The values of (i?g) obtained for these 
three box sizes are extrapolated to infinite box size in or- 
der to determine (i?^) (and consequently c*) in the dilute 
limit. 

It is clear from Table (I), both in the dilute limit, with 
regard to values of c/c* , and at all three finite concentra- 
tions, with regard to values of {RD, {R^) and Dl, that 
there is excellent agreement between the two mesoscopic 
simulation methods, since all properties agree with each 
other within error bars. This validates the current algo- 
rithm in a regime where there are no analytical solutions. 
Further, it demonstrates the robustness of the parame- 
ter mapping technique developed by Pham et at [21] for 
comparing the two simulation methods. 



VII. COMPARISON OF COMPUTATIONAL 
COST WITH LB/MD 

Our recent comparison [21] of the predictions of the 
explicit solvent LB/MD method with the predictions of 
the implicit solvent BD method for the dynamics of a 
single chain indicated that in the dilute limit, BD is the 
method of choice as it is significantly more efficient than 
LB/MD. However, Fig. 8 suggests that for our current 
implementation the situation is quite the reverse at the 
finite concentration, c/c* = 1.2, at which the simula- 
tion data in the figure were obtained. The comparison 
of the two mesoscopic simulation methods displayed in 
Fig. 8 was carried out using the identical procedure devel- 
oped earlier by Pham et al. [21]. Essentially, the LB/MD 
method was run for a total of 100 MD time steps (with a 
step size of 0.01 in LB units, or 0.018 in BD units). This 
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FIG. 8. (Color online) Comparison of the CPU time required 
by the LB and BD systems for a wide range of system sizes 
A^, at concentration cjc^ = 1.2, for the equivalent of one LB 
time unit. 



amounts to a total simulation time of one time unit in 
terms of LB units. The BD algorithm was then run for 
the same length of physical time, by converting one LB 
time unit to BD time units. The BD algorithm required 
a significantly smaller time step of 10^^ in BD units. 
The reason for this choice is because the current imple- 
mentation uses a simple Euler integration scheme, with a 
rejection algorithm that ensures that none of the springs 
in any of the bead-spring chains exceeds the upper limit 
of the FENE spring length v6. In contrast, the earlier 
comparison of the two methods in the dilute limit was 
based on a BD code that uses a semi-implicit predictor- 
corrector method, enabling the use of a much larger step 
size of 50 X 10~^ BD units. The dependence of CPU time 
on system size was examined here by increasing the num- 
ber of chains A^c, while keeping the number of beads in 
a chain fixed at N^, = 10. The concentration was main- 
tained constant at c/c* = 1.2 (or c = 0.031 in BD units) 
by increasing the box size L suitably. Since the difference 
between the HMA and LMA BD algorithms is insignifi- 
cant on the scale of the difference between LB/MD and 
BD, only results for the LMA are shown in Fig. 8. 

The CPU time scaling of the LMA algorithm has been 
established in Sec. V to be N^-^. From Eqs. (33) and (34) 
one immediately sees that after optimisation the CPU 
time depends only on the particle number N , but is in- 
dependent of the concentration c (or the system volume 

vy. 



T 



LMA 



(TV, V) = 7' 



LMAtw-I.S 



(40) 



with some proportionality constant -f^^^. Conversely, 
the LB/MD method is dominated by the CPU effort of 



c 



with another constant 7^^^. Hence 
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LB 



yLB 



cN'> 



r 



(41) 



(42) 



From our CPU timings we find a value 'j-LMA^^lb _ 
1.3 X 10** in BD units, i.e. our current implementation of 
Ewald BD becomes competitive with LB/MD only if the 
concentration is below the very small value 7.8 x 10^^ x 

iV-O-8. 

However, it should be noted that the present version 
is by far not the fastest conceivable BD code. Firstly, 
we expect that by implementing an implicit integrator 
the time step may be increased by nearly two orders of 
magnitude. Secondly, the evaluation of the real-space 
HI should be substantially faster (both in the LMA and 
HMA versions) by making use of Verlet tables. Thirdly, 
the HMA algorithm could then take advantage of sparse- 
matrix techniques (see also the discussion in Sec. V). Fi- 
nally, the evaluation of the Fourier part can be speeded 
up by making use of Fast Fourier Transformation, which, 
as shown previously, gives rise to a complexity of the total 
algorithm of 0(7Vi-3 log TV) [38, 39]. Ah together, achiev- 
ing accelerations by up to three orders of magnitude does 
not seem unrealistic. 



VIII. 



SUMMARY 



A range of issues related to the development of an 
optimised BD algorithm for simulating the dynamics of 
semidilute solutions in unbounded domains has been con- 
sidered here. In particular: 

1. It is possible to develop an optimised Ewald method 
for hydrodynamic interactions that splits the cost 
of evaluating the real space and reciprocal space 
sums evenly, leading to a CPU cost that scales as 
N^-^, rather than the iV^ scaling that would result 
from a straightforward implementation. 

2. While Beenakker's original implementation of the 
Ewald sum is only valid for systems without bead 
overlap, it can be modified to account for bead 
overlap, such that 0-solutions can be simulated by 
switching off excluded- volume interactions. To the 
best of our knowledge, this is the first implementa- 
tion of an Ewald sum for the regularised branch of 
the RPY tensor. 

3. As in the case of dilute solutions, the number of 
Chebychev terms required to maintain a desired 
accuracy scales as (Amax/Amin)', where Amax and 
Amin are the maximum and minimum eigenvalues of 
the diffusion tensor T>. It is shown that this leads 
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to an additional computational load that scales as 
7VO-3. 



4. It is necessary to consider the optimisation of the 
overall time required to perform one Euler time 
step, in addition to the individual optimisations 
of the Ewald sum and Chebychev polynomial ap- 
proximation. In this context, two different schemes 
for optimisation have been proposed in the form 
of the "high memory" (HMA) , and the "low mem- 
ory" (LMA) algorithms. While the LMA leads to 
an overall CPU time scaling of A''^'*, which appears 
better than the A^^-^ scaling of the HMA, the large 
prefactor in the former makes it preferable only for 
large systems with more than roughly 35,000 par- 
ticles. 



finite concentration in the semidilute regime. 

6. In contrast to dilute solutions, where BD was shown 
to be significantly more computationally efficient 
than LB/MD [21], exactly the opposite is true for 
semidilute solutions. The CPU cost of the BD 
method scales as A^^-*, while the cost of the LB/MD 
method scales linearly with system size. The neces- 
sity of carrying out an Ewald sum renders the BD 
method developed here significantly more compu- 
tationally expensive than LB/MD. Nevertheless, it 
should be noted that the BD method can be further 
refined and dramatically speeded up, as discussed 
at the end of Sec. VII. 
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